Inference of Markovian Properties of Molecular Sequences from NGS Data and Applications to Comparative Genomics
Next Generation Sequencing (NGS) technologies generate large amounts of short
read data for many different organisms. The fact that NGS reads are generally
short makes it challenging to assemble the reads and reconstruct the original
genome sequence. For clustering genomes using such NGS data, word-count based
alignment-free sequence comparison is a promising approach, but for this
approach, the underlying expected word counts are essential.
A plausible model for this underlying distribution of word counts is given
through modelling the DNA sequence as a Markov chain (MC). For single long
sequences, efficient statistics are available to estimate the order of MCs and
the transition probability matrix for the sequences. As NGS data do not provide
a single long sequence, inference methods on Markovian properties of sequences
based on single long sequences cannot be directly used for NGS short read data.
Here we derive a normal approximation for such word counts. We also show that
the traditional Chi-square statistic has an approximate gamma distribution,
using the Lander-Waterman model for physical mapping. We propose several
methods to estimate the order of the MC based on NGS reads and evaluate them
using simulations. We illustrate the applications of our results by clustering
genomic sequences of several vertebrate and tree species based on NGS reads
using alignment-free sequence dissimilarity measures. We find that the
estimated order of the MC has a considerable effect on the clustering results,
and that the clustering results that use a MC of the estimated order give a
plausible clustering of the species. Comment: accepted by RECOMB-SEQ 201
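The word-count idea in the abstract above can be illustrated with a minimal sketch, assuming error-free reads on a single strand. It pools counts of words of length order + 1 over all reads, so that no word spans two reads, and turns them into plug-in estimates of the Markov chain transition probabilities. The function names `count_words` and `transition_probs` are hypothetical, and this is not the paper's order-estimation statistic.

```python
from collections import Counter


def count_words(reads, k):
    """Pool counts of every length-k word over all reads.

    Words are counted within each read only, so no word spans two reads.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def transition_probs(reads, order):
    """Plug-in estimate of P(next base | previous `order` bases).

    Returns a dict mapping each observed word of length order + 1 to the
    estimated probability of its last base given its first `order` bases.
    """
    word_counts = count_words(reads, order + 1)
    context_totals = Counter()
    for word, n in word_counts.items():
        context_totals[word[:-1]] += n
    return {
        word: n / context_totals[word[:-1]]
        for word, n in word_counts.items()
    }


# Toy reads (illustrative data, not from the paper).
reads = ["ACGT", "ACCA"]
probs = transition_probs(reads, order=1)
```

Here the context C is followed once each by G, C and A, so each of those transitions gets an estimate of 1/3, while A is always followed by C.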
Comparison of metagenomic samples using sequence signatures
BACKGROUND: Sequence signatures, as defined by the frequencies of k-tuples (or k-mers, k-grams), have been used extensively to compare genomic sequences of individual organisms, to identify cis-regulatory modules, and to study the evolution of regulatory sequences. Recently, many next-generation sequencing (NGS) read data sets of metagenomic samples from a variety of environments have been generated. The assembly of these reads can be difficult, and analysis methods based on mapping reads to genes or pathways are also restricted by the availability and completeness of existing databases. Sequence-signature-based methods, however, do not need complete genomes or existing databases and thus can potentially be very useful for the comparison of metagenomic samples using NGS read data. Still, the application of sequence signature methods to the comparison of metagenomic samples has not been well studied. RESULTS: We studied several dissimilarity measures for the comparison of metagenomic samples using sequence signatures: d2, d2* and d2S, recently developed by our group; a measure (hereinafter denoted Hao) used in CVTree, developed by Hao's group (Qi et al., 2004); measures based on relative di-, tri-, and tetra-nucleotide frequencies as in Willner et al. (2009); and standard l_p measures between the frequency vectors. We compared their performance using a series of extensive simulations and three real NGS metagenomic datasets: 39 fecal samples from 33 mammalian host species, 56 marine samples from across the world, and 13 fecal samples from human individuals. Results showed that the dissimilarity measure d2S can achieve superior performance when comparing metagenomic samples, both in clustering them into different groups and in recovering environmental gradients affecting microbial samples.
New insights into the environmental factors affecting microbial compositions in metagenomic samples are obtained through the analyses. Our results show that sequence signatures of the mammalian gut are closely associated with diet and gut physiology of the mammals, and that sequence signatures of marine communities are closely related to location and temperature. CONCLUSIONS: Sequence signatures can successfully reveal major group and gradient relationships among metagenomic samples from NGS reads without alignment to reference databases. The d2S dissimilarity measure is a good choice in all application scenarios. The optimal choice of tuple size depends on sequencing depth, but it is quite robust within a range of choices for moderate sequencing depths.
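As a rough illustration of a sequence-signature comparison, the sketch below computes a normalized d2-type dissimilarity between the k-tuple count vectors of two read sets. It assumes one common normalized form, 0.5 * (1 - sum(X_w * Y_w) / (|X| * |Y|)), which may differ in detail from the definition used in the paper. The variants d2* and d2S additionally require expected word counts under a background model and are not shown. The function names and the toy reads are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(reads, k):
    """Count every k-tuple occurring within each read of a sample."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def d2_dissimilarity(x, y):
    """Normalized d2-type dissimilarity between two k-tuple count dicts.

    Assumed form: 0.5 * (1 - <x, y> / (|x| * |y|)), which ranges from 0
    (proportional count vectors) to 0.5 (no shared k-tuples).
    """
    dot = sum(n * y.get(word, 0) for word, n in x.items())
    norm_x = sqrt(sum(n * n for n in x.values()))
    norm_y = sqrt(sum(n * n for n in y.values()))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("each sample needs at least one k-tuple")
    return 0.5 * (1.0 - dot / (norm_x * norm_y))


# Toy samples (illustrative data only).
sample_a = kmer_counts(["ACGTACGT", "CGTACG"], k=3)
sample_b = kmer_counts(["AAAAAA"], k=3)
same = d2_dissimilarity(sample_a, sample_a)
disjoint = d2_dissimilarity(sample_a, sample_b)
```

With these toy inputs, comparing a sample with itself gives a value at or very near 0, and two samples that share no 3-tuples give 0.5.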
Responses of soil nitrogen mineralization to temperature and moisture in alpine ecosystems on the Tibetan Plateau
The responses of soil net nitrogen (N) mineralization to temperature and moisture were investigated in four alpine ecosystems (forest, shrub, meadow and steppe) on the Tibetan Plateau by laboratory incubation of undisturbed soil cores. The results indicated that soil net N mineralization varies greatly between alpine ecosystems. Across the three incubation moisture levels, the soil net N mineralization rate rose markedly in the forest ecosystem and gently in the meadow ecosystem as temperature increased from 5°C to 35°C, whereas in the shrub and steppe ecosystems it increased from 5°C to 25°C and declined from 25°C to 35°C. At a given incubation temperature, soil net N mineralization in all four alpine ecosystems increased at intermediate moisture and decreased at low or high moisture.
The impact of atmospheric N deposition and N fertilizer type on soil nitric oxide and nitrous oxide fluxes from agricultural and forest Eutric Regosols
Agricultural and forest soils with low organic C content and high alkalinity were studied over 17 days to investigate the potential response of the atmospheric pollutant nitric oxide (NO) and the greenhouse gas nitrous oxide (N2O) to (1) increased N deposition rates to forest soil; (2) different fertilizer types applied to agricultural soil; and (3) a simulated rain event on forest and agricultural soils. Cumulative forest soil NO emissions (148–350 ng NO-N g−1) were ~4 times larger than N2O emissions (37–69 ng N2O-N g−1). In contrast, agricultural soil NO emissions (21–376 ng NO-N g−1) were ~16 times smaller than N2O emissions (45–8491 ng N2O-N g−1). Increasing N deposition rates 10-fold to 30 kg N ha−1 yr−1 doubled soil NO emissions and NO3− concentrations. As such high N deposition rates are not atypical in China, more attention should be paid to forest soil NO research. Comparing the fertilizers urea, ammonium nitrate, and urea coated with the urease inhibitor ‘Agrotain®’ demonstrated that the inhibitor significantly reduced NO and N2O emissions. This is an unintended and not well-known benefit, because the primary function of Agrotain® is to reduce emissions of the atmospheric pollutant ammonia. Simulating a climate change event, a large rainfall after drought, increased soil NO and N2O emissions from both agricultural and forest soils. Such pulses of emissions can contribute significantly to annual NO and N2O emissions, but currently do not receive adequate attention from the measurement and modeling communities.
A comprehensive analysis and source apportionment of metals in riverine sediments of a rural-urban watershed.
Quantitative assessment of metal sources in sediments is essential for implementation of source control and remediation strategies. This study investigated metal contamination in sediments to assess potential ecological risks and quantify pollutant sources of metals (Cu, Zn, Pb, Cd, Cr, Co and Ni) in the Wen-Rui Tang River watershed. Total and fraction analysis indicated high pollution levels of metals. Zn and Cd posed high ecological risk based on the risk assessment code, with the highest ecological risk found in the southwestern part of the watershed. The positive matrix factorization (PMF) model was highly effective in predicting total metal concentrations and identified three contributing metal sources. An agricultural source (factor 1) contributed highly to Cu (74.1%) and Zn (42.5%), and was most prominent in the west and south-central portions of the watershed. Cd (93.5%) showed a high weighting with industrial sources (factor 2), with a hot spot in the southwest. Factor 3 was identified as a mixed natural and vehicle traffic source that contributed substantially to Cr (65.2%), Ni (63.9%) and Pb (50.7%). Spatial analysis indicated a consistent pattern between PMF-identified factors and suspected metal sources at the watershed scale, demonstrating the efficacy of the PMF modeling approach for watershed analysis.
Liao ning virus in China
BACKGROUND: Liao ning virus (LNV) is in the genus Seadornavirus within the family Reoviridae and has a genome composed of 12 segments of double-stranded RNA (dsRNA). It is transmitted by mosquitoes and has been isolated only in China to date, and it is the only species within the genus Seadornavirus reported to have been propagated in mammalian cell lines. In this study, we report 41 new isolates from the northern and southern Xinjiang Uygur Autonomous Region in China and describe the phylogenetic relationships among all 46 Chinese LNV isolates. FINDINGS: The phylogenetic analysis indicated that all the isolates evaluated in this study can be divided into 3 groups that appear to be related to geographic origin, based on the partial nucleotide sequence of the 10th segment, which is predicted to encode outer coat proteins of LNV. Bayesian coalescent analysis estimated the date of the most recent common ancestor of the current Chinese LNV isolates to be 318 (with a 95% confidence interval of 30-719), and the estimated evolutionary rate is 1.993 × 10^-3 substitutions per site per year. CONCLUSIONS: The results indicated that LNV may be an emerging virus at a rapidly evolving stage and has been widely distributed in the northern part of China.